This notebook pulls a NIfTI dataset from OpenNeuro (if it isn't already present), processes the NIfTI data with NiBabel, and finally displays the data as a set of images. Included at the end of the demo are a few libraries and ideas for extending this notebook.
We use a few external libraries (documentation for each is linked in the list below):
# This block checks whether the openneuro package is installed.
# If it isn't, try to install the requirements file.
import subprocess
import sys
import os

try:
    import openneuro
except ImportError:
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", "-r", f"{os.getcwd()}/../requirements.txt"]
    )
Here's a short description of the libraries this notebook uses and what each one does for us.
openneuro-py gives us API access to the OpenNeuro data.
NumPy gives us access to fast array-manipulation tools.
SciPy is a general scientific-computing library; we're particularly interested in ndimage, the N-Dimensional IMAGE module.
Matplotlib is a general plotting library; this is what will ultimately render our images.
NiBabel is a neuroimaging data reader. It will open and read our NIfTI data from our *.nii.gz files.
# External
import openneuro as on
import numpy as np
import scipy.ndimage as ndi
import matplotlib.pyplot as plt
import nibabel as nib
# "Python"
import logging
from pathlib import Path
from IPython.display import display, Markdown, Latex
We now tell Python where to put/find our datasets. We define two things here:
DATA_PATH: a string describing the path (Unix style) relative to the working directory (the directory the notebook is in). Defined in all caps since I'm treating it as a constant.
data_dir: a Path() Python object; this lets us live in an "OS-agnostic world". It's defined in lower case since it's a variable assigned at interpretation time.
I decided to put the data into the same folder as the notebook. There's no particular reason to do that; in fact, if you're writing your own project it's probably a bad choice. A better choice would be the root of the repo.
DATA_PATH = "./data"
data_dir = Path(DATA_PATH)
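As a quick illustration of why Path helps: joining with the `/` operator builds paths with the correct separator on any OS (the subfolder name below is just a hypothetical example, not part of this dataset's layout):

```python
from pathlib import Path

data_dir = Path("./data")
# The "/" operator joins path components with the correct
# separator for the current OS; "sub-01" is a hypothetical subfolder.
subject_dir = data_dir / "sub-01"
print(subject_dir.name)
```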
Next we check whether the data is already present. We first check whether the parent directory exists; if it doesn't, we create it.
We then check whether the list of all .nii.gz files in our data directory is empty. If the list is empty, there is no NIfTI data and we should pull down a dataset from OpenNeuro.
I picked the Working Memory and Reward in Children with and without Attention Deficit Hyperactivity Disorder (ADHD) dataset. I picked this dataset for no particular reason; this code should "just work" with other datasets. If you look on the OpenNeuro website you can see:
I don't want to download all 40 GB of that data, so I use the filtering mechanism in openneuro-py to get only the first subject's files (the part that says include="sub-01/*").
Pick an alternate MRI dataset from OpenNeuro and modify the following markdown and Python blocks to document and pull down that dataset.
if not data_dir.exists():
    data_dir.mkdir(parents=True)
if not next(data_dir.glob("**/*.nii.gz"), None):
    on.download(dataset="ds002424", target_dir=data_dir, include="sub-01/*")
...
👋 Hello! This is openneuro-py 2024.2.0. Great to see you! 🤗
👉 Please report problems 🤯 and bugs 🪲 at
https://github.com/hoechenberger/openneuro-py/issues
🌍 Preparing to download ds002424 …
📁 Traversing directories for ds002424 : 0 entities [00:00, ? entities/s]
📥 Retrieving up to 14 files (5 concurrent downloads). ✅ Finished downloading ds002424. 🧠 Please enjoy your brains.
participants.json: 0%| | 0.00/3.30k [00:00<?, ?B/s]
CHANGES: 0%| | 0.00/291 [00:00<?, ?B/s]
README: 0%| | 0.00/506 [00:00<?, ?B/s]
participants.tsv: 0%| | 0.00/14.2k [00:00<?, ?B/s]
dataset_description.json: 0%| | 0.00/1.62k [00:00<?, ?B/s]
sub-01_ses-T1_task-SLI_bold.nii.gz: 0%| | 0.00/75.3M [00:00<?, ?B/s]
sub-01_ses-T1_task-SLD_events.tsv: 0%| | 0.00/4.88k [00:00<?, ?B/s]
sub-01_ses-T1_T1w.nii.gz: 0%| | 0.00/8.44M [00:00<?, ?B/s]
sub-01_ses-T1_task-SLI_events.tsv: 0%| | 0.00/4.90k [00:00<?, ?B/s]
sub-01_ses-T1_task-SLD_bold.nii.gz: 0%| | 0.00/77.1M [00:00<?, ?B/s]
sub-01_ses-T1_task-SSI_events.tsv: 0%| | 0.00/4.91k [00:00<?, ?B/s]
sub-01_ses-T1_task-SSD_bold.nii.gz: 0%| | 0.00/75.3M [00:00<?, ?B/s]
sub-01_ses-T1_task-SSI_bold.nii.gz: 0%| | 0.00/75.2M [00:00<?, ?B/s]
sub-01_ses-T1_task-SSD_events.tsv: 0%| | 0.00/4.90k [00:00<?, ?B/s]
The read function loads the NIfTI data from the fname file.
Update this and the following Python block as needed for your dataset.
Thanks, James, for implementing this. The only changes I made were to change the prints to logs.
def read_data(fname):
    logging.info("== Opening file with Nibabel ==")
    img = nib.load(fname)
    logging.info("== Data image from %s has this shape ==", fname)
    logging.info(img.shape)
    # query (and log) the on-disk datatype
    logging.info("== On-disk datatype: %s ==", img.get_data_dtype())
    # turn raw data into numbers
    data = img.get_fdata()
    logging.info("== Numpy array from %s has this shape ==", fname)
    logging.info(data.shape)
    # and get a header (for metadata)
    hdr = img.header
    # hdr.get_xyzt_units()
    logging.info("== data header ==")
    logging.info(hdr.structarr)
    return img, data, hdr
The preprocessor function takes the data (NumPy array) and "removes" the time component, if one exists.
Update this and the following Python block as needed for your dataset.
Again, I don't really have enough experience working with NIfTI data to know how to do this "properly", but I think this works.
def preprocess_data(data):
    if len(data.shape) == 4:
        data = data[:, :, :, 0]
    return data
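An alternative (also just a sketch; whether averaging is appropriate depends on your analysis) is to collapse the time axis with a mean instead of keeping only the first volume:

```python
import numpy as np

def preprocess_data_mean(data):
    """Collapse a 4D (x, y, z, t) array to 3D by averaging over time.
    A hypothetical alternative to taking only the first volume."""
    if data.ndim == 4:
        data = data.mean(axis=3)
    return data
```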
The processor function takes the data (NumPy array) and ...
Update this and the following Python block as needed for your dataset.
I'm not doing any real data processing. If I were, this is where I would put the code that computes on the data.
def process_data(data):
    return data
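For example, a real processing step might be a Gaussian smooth via scipy.ndimage. This is just an illustrative stand-in, not something the notebook needs:

```python
import numpy as np
import scipy.ndimage as ndi

def process_data_smooth(data, sigma=1.0):
    """Hypothetical processing step: smooth the volume with a
    Gaussian filter of width sigma (in voxels)."""
    return ndi.gaussian_filter(data, sigma=sigma)
```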
The display function takes the data (NumPy array) and displays three views of the MRI: slice samples over each of the spatial coordinates.
The function has optional arguments for the number of rows and columns; these change the number of slices to sample from the data.
Update this and the following Python block as needed for your dataset.
I don't really know what I'm doing here. I found some template code and made some minor changes. It's probably a good idea to look at this in more detail.
def display_data(data, fig_rows: int = 4, fig_cols: int = 4, name: str = "MRI"):
    n_subplots = fig_rows * fig_cols
    # Sample evenly spaced slices along each spatial axis (x, y, z).
    for axis, axis_name in enumerate("xyz"):
        n_slice = data.shape[axis]
        step_size = n_slice // n_subplots
        plot_range = n_subplots * step_size
        start_stop = (n_slice - plot_range) // 2
        fig, axs = plt.subplots(fig_rows, fig_cols, figsize=[10, 10])
        for idx, img in enumerate(range(start_stop, start_stop + plot_range, step_size)):
            axs.flat[idx].imshow(ndi.rotate(np.take(data, img, axis=axis), 90), cmap='bone')
            axs.flat[idx].axis('off')
        plt.tight_layout()
        display(Markdown(f'## MRI slices in {axis_name} of {name}'))
        plt.show()
The postprocessor function takes the data (NumPy array) and ...
Update this and the following Python block as needed for your dataset.
I don't plan on saving the data. If I were, this is where I would implement the "clean-up" code that formats the data for storage.
def postprocess_data(data):
    return data
The store function takes the data (NumPy array) and ...
Update this and the following Python block as needed for your dataset.
I'm not doing any data storage. If I were, this is where I would put things like compressing the data and writing it to disk.
def store_data(data) -> bool:
    return True
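If you did want to persist results, one sketch (using NumPy's compressed format rather than writing NIfTI back out; the function name and path are hypothetical) might look like:

```python
import numpy as np
from pathlib import Path

def store_data_npz(data, out_path) -> bool:
    """Sketch: compress the array and write it to disk.
    Returns True if the file was written."""
    np.savez_compressed(out_path, data=data)
    return Path(out_path).exists()
```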
We now get to the meat and potatoes of the program. This is where the main set of instructions goes. Here I've implemented the following state machine.
mermaid
stateDiagram-v2
    First: foreach nifti file
    state First {
        [*] --> read
        read --> preprocess
        preprocess --> Process
        Process --> Display
        Display --> Postprocess
        Postprocess --> store
        store --> [*]
    }
The ultimate goal here is to hit one of these datasets with Mapper. To do that you'll need to:
Suggestion from James: "A very quick suggestion is to simply threshold the raw data and only keep a very sparse subset (i.e. 1% of the raw brain data). That should be enough to get people started. We can then punt the challenge of scaling up to later."
I think this is absolutely the right first step. Who wants to be the next person to take a stab at filling out this notebook for Mapper?
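James's thresholding suggestion could be sketched like this (the function name, keep_fraction, and the quantile cutoff are my assumptions about what "keep 1% of the raw brain data" means):

```python
import numpy as np

def sparse_threshold(data, keep_fraction=0.01):
    """Zero out all but (roughly) the brightest keep_fraction of voxels.
    A sketch of James's 'keep a very sparse subset' suggestion."""
    cutoff = np.quantile(data, 1.0 - keep_fraction)
    mask = data > cutoff
    return np.where(mask, data, 0.0), mask
```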
for niigz in data_dir.glob("**/*.nii.gz"):
    try:
        vol, data, hdr = read_data(niigz)
        # Do stuff
        data = preprocess_data(data)
        # Do stuff
        data = process_data(data)
        # Do stuff
        display_data(data, name=str(niigz.stem))
        # Do stuff
        data = postprocess_data(data)
        # Do stuff
        store_data(data)
    except Exception:
        # Log the failure instead of silently swallowing it.
        logging.exception("Failed to process %s", niigz)
...